System and Method for Measuring Performances of Surveillance Systems

ABSTRACT

A computer implemented method measures a performance of a surveillance system. A site model, a sensor model and a traffic model are selected respectively from a set of site models, a set of sensor models, and a set of traffic models to form a surveillance model. Based on the surveillance model, surveillance signals are generated. The performance of the surveillance system is evaluated according to qualitative surveillance goals and the surveillance signals to determine a value of a quantitative performance metric of the surveillance system.

FIELD OF THE INVENTION

This invention relates generally to surveillance systems, and more particularly to measuring performances of autonomous surveillance systems.

BACKGROUND OF THE INVENTION

Surveillance System

A surveillance system acquires surveillance signals from an environment in which the system operates. The surveillance signals can include images, video, audio and other sensor data. The surveillance signals are used to detect and identify events and objects, e.g., people, in the environment.

As shown in FIG. 1, a typical prior art surveillance system 10 includes a distributed network of sensors 11 connected to a centralized control unit 12 via a network 13. The sensor network 11 can include passive and active sensors, such as motion sensors, door sensors, heat sensors, fixed cameras and pan-tilt-zoom (PTZ) cameras. The control unit 12 includes display devices, e.g., TV monitors, bulk storage devices such as VCRs, and control hardware. The control unit can process, display and store sensor data acquired by the sensor network 11. The control unit can also be involved in the operation of the active sensors of the sensor network. The network 13 can use an internet protocol (IP).

It is desired to measure the performance of a surveillance system, particularly where the control of the sensors is automated.

Scheduling

The scheduling of active sensors, such as the PTZ cameras, impacts the performance of surveillance systems. A number of scheduling policies are known. However, different scheduling policies can perform differently with respect to the performance goals and structure of the surveillance system. Thus, it is important to be able to measure the performance of surveillance systems quantitatively with different scheduling policies.

Surveillance System Performance

Typically, automated surveillance systems have been evaluated only with respect to their component processes, such as image-based object tracking. For example, one can evaluate the performance of moving-object tracking under varying conditions, including indoor/outdoor, varying weather conditions and varying cameras/viewpoints. Standard data sets are available to evaluate and compare the performance of tracking processes. Image analysis procedures, such as object classification and behavior analysis, have also been tested and evaluated. However, because not all surveillance systems use these functions and because there is no standard performance measure, that approach has limited utility.

Scheduling policies have been evaluated for routing a packet in a computer or communications network or scheduling a job in multitasking computers. Each packet has a deadline and each class of packets has an associated weight, and the goal is to minimize the weighted loss due to dropped packets (a packet is dropped if it is not served by the router before its deadline). However, in those applications, the serving time usually depends only upon the server, whereas in the surveillance case it depends upon the object itself. In the context of a video surveillance system, “packets” correspond to objects, e.g., people, which have different serving times based on their location, motion, and distance to the cameras. A “dropped packet” in a PTZ-based video surveillance system corresponds to an object departing a site before being observed at a high resolution by a PTZ camera. As a result, each object may have an estimated deadline corresponding to the time it is expected to depart the site. Thus, computer-oriented or network-oriented scheduling evaluation cannot directly be applied to the surveillance problem.

The surveillance scheduling problem can also be formulated as a kinetic traveling salesperson problem. A solution can be approximated by iteratively solving time-dependent orienteering problems. However, that would require the assumption that the paths of surveillance targets are known, or predictable with constant velocity and linear paths, which is unrealistic in practical applications. Moreover, it would require the assumption that the motion of a person being observed by a PTZ camera is negligible, which is not true if the observation time, or “attention interval,” is long enough.

The ODVIS system supports research in video surveillance tracking. That system provides researchers the ability to prototype tracking and event recognition techniques using a graphical interface, C. Jaynes, S. Webb, R. Steele, and Q. Xiong, “An open development environment for evaluation of video surveillance systems,” IEEE Workshop on Performance Analysis of Video Surveillance and Tracking (PETS'2002), in conjunction with ECCV, June 2002. That system operates on standard data sets for surveillance systems, e.g., the various standard PETS videos, J. Ferryman, “Performance evaluation of tracking and surveillance,” Empirical Evaluation Methods in Computer Vision, December 2001.

Another method measures image quality for surveillance applications using image fine structure and local image statistics, e.g., noise, contrast (blur vs. sharpness), color information, and clipping, Kyungnam Kim and Larry S. Davis, “A fine-structure image/video quality measure using local statistics,” ICIP, pp. 3535-3538, 2004. That method only operates on real video acquired by surveillance cameras and only evaluates image quality. That method makes no assessment of what is going on in the video, i.e., the underlying content of the video and the particular task that is being performed.

Virtual Surveillance

A system for generating videos of a virtual reality scene is described by W. Shao and D. Terzopoulos, “Autonomous pedestrians,” Proc. ACM SIGGRAPH, Eurographics Symposium on Computer Animation, pp. 19-28, July 2005. That system uses a hierarchical model to simulate a single large-scale environment (Pennsylvania Station in New York City), and an autonomous pedestrian model. Surveillance issues are not considered. That simulator was later extended to include a human-operated sensor network for surveillance simulation, F. Qureshi and D. Terzopoulos, “Towards intelligent camera networks: A virtual vision approach,” Proc. The Second Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, October 2005.

In later work, camera scheduling policies are described, still for the same single Pennsylvania Station environment, F. Z. Qureshi and D. Terzopoulos, “Surveillance camera scheduling: A virtual vision approach,” ACM International Workshop on Video Surveillance and Sensor Networks, 2005. There, the camera controller is modeled as an augmented finite state machine. In that work, the train station is populated with various numbers of pedestrians. Then, that method determines whether different scheduling strategies detect the pedestrians or not. They do not describe generalized quantitative performance metrics. Their performance measurement is specific to the single task of active cameras viewing each target exactly once.

It is desired to provide a general quantitative performance metric that can be applied to any surveillance system, i.e., surveillance systems with networks of fixed cameras, manually controlled active cameras, or automatically controlled fixed and active cameras, that is independent of post-acquisition processing steps, and that can be specialized to account for various surveillance goals.

SUMMARY OF THE INVENTION

The embodiments of the invention provide a computer implemented method for measuring a performance of a surveillance system. A site model, a sensor model and a traffic model are selected from a set of site models, a set of sensor models, and a set of traffic models to form a surveillance model. Based on the surveillance model, surveillance signals are generated, simulating an operation of the surveillance system. Performance of the surveillance system is evaluated according to qualitative surveillance goals to determine a value of a quantitative performance metric of the surveillance system. Selecting a plurality of the surveillance models enables analyzing the performance of multiple surveillance systems statistically.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a prior art surveillance system;

FIG. 2 is a block diagram of a method and system for measuring the performance of a surveillance system according to an embodiment of the invention;

FIG. 3 is a top view of an environment under surveillance; and

FIG. 4 is an example image generated by the system according to an embodiment of the invention for the environment of FIG. 3.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

One embodiment of our invention provides a system and method for simulating, analyzing, and measuring a performance of a surveillance system. The surveillance system can include fixed cameras, pan-tilt-zoom (PTZ) cameras, and other sensors, such as audio, ultrasound, infrared, and motion sensors, and can be manually or automatically controlled.

Our system generates simulated surveillance signals, much like the real world surveillance sensor network 11 would. The signals are operated on by procedures that evaluate object detection and tracking, evaluate action recognition, and evaluate object identification.

The signals can include video, images, and other sensor signals. The operation of the surveillance system can then be evaluated using our quantitative performance metric to determine whether the surveillance system performs well on various surveillance goals. By using this metric, the simulation can be used to improve the operation of a surveillance system, or to find optimal placement of sensors.

Another purpose of the embodiments of our invention is to rapidly evaluate a large number of surveillance systems, in a completely automatic manner, with different assumptions at a low cost, and yet provide meaningful results. Herein, we define a surveillance model as a combination of a site model, a traffic model and a sensor model selected from a set of site, traffic and sensor models. The site, traffic and sensor models are described below. Herein, we also define a set conventionally. Generally, a set has one or more members, or none at all.

System Structure

FIG. 2 shows an embodiment of a system 20 for measuring a performance 101 of a surveillance system. The surveillance system includes a control unit 12 connected to a simulator 30 via a network 13. The simulator 30 generates surveillance signals that are similar to the signals that would be generated by the sensor network 11 of FIG. 1.

The simulator 30 has access to sets of surveillance models 22 including a set of site models, a set of sensor models, and a set of traffic models. The system also includes an evaluator 24.

Surveillance Models

In an embodiment of our invention, we simulate 30 an operation of a sensor network using selected surveillance models 22 to generate surveillance signals 31. The signals can include video, images, and other sensor signals.

The surveillance signals can be presented to the internet protocol (IP) network 13 using IP interfaces that are becoming the prominent paradigm in surveillance applications.

Our system allows us to evaluate 24 a large number of different surveillance system configurations automatically, under different traffic conditions, in a short time, and without having to invest in a costly physical plant, but using the models instead. This is done by selecting multiple instances of the surveillance models, each instance including a site, sensor and traffic model.

Site Model Set

Each site model represents a specific surveillance environment, e.g., a building, a campus, an airport, an urban neighborhood, and the like. In general, the site models can be in the form of 2D or 3D graphic models. The site model can be generated from floor plans, site plans, architectural drawings, maps, and satellite images. The site model can have an associated scene graph to assist the rendering procedures. In essence, the site model is a spatial description of where the surveillance system is to operate.

Sensor Model Set

Each sensor model represents a set of sensors that can be arranged in a site. In other words, a particular sensor model can be associated with a corresponding site model. The sensors can be fixed cameras, PTZ cameras, or other sensors, such as motion, door, audio, ultrasound, infrared, water, heat, and smoke sensors. Therefore, the sensor models indicate the type of sensors, their optical, electrical, mechanical, and data acquisition characteristics, and their locations. The sensors can be passive or active. Each sensor can also be associated with a set of scheduling policies. The scheduling policies indicate how and when sensors are used over time. For PTZ cameras, the models indicate how the cameras can be operated autonomously while detecting and tracking objects using the scheduling policies. A sensor can be evaluated for a selected one or more of the set of scheduling policies.
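The following is a minimal sketch of how one entry of a sensor model might be represented in code; the class and field names (SensorModel, fov_deg, pan_speed_deg_s, and so on) are illustrative assumptions rather than part of the described embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SensorModel:
    sensor_type: str                            # e.g., "fixed_camera", "ptz_camera", "motion"
    location: Tuple[float, float, float]        # position within the associated site model
    fov_deg: float = 60.0                       # optical characteristic: field of view
    resolution: Tuple[int, int] = (640, 480)    # data acquisition characteristic
    pan_speed_deg_s: float = 0.0                # mechanical characteristic; 0 for fixed sensors
    scheduling_policies: List[str] = field(default_factory=list)

# A PTZ camera evaluated under one selected scheduling policy:
ptz = SensorModel("ptz_camera", (0.0, 5.0, 3.0), fov_deg=30.0,
                  pan_speed_deg_s=90.0, scheduling_policies=["earliest_departure"])
```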

Scheduling Policies

Scheduling policies can be predictive or non-predictive.

Non-Predictive Policies

“Earliest Arrival” is also known as “First Come, First Served.” This policy simply selects the next target based on the earliest arrival time in the site. This policy implicitly pursues a goal of minimizing missed targets under the assumption that objects with earlier arrivals are likely to have earlier departures. This temporal policy does not take into consideration any spatial information. Therefore, it cannot pursue minimizing traveling and could suffer from excess traveling.

A “Close to Far” policy is also known as “Bottom to Top,” because a typical surveillance camera is positioned high on a wall or ceiling, looking horizontally and down, making ground objects close to the camera appear near the bottom of the image, and those far from the camera near the top. This policy selects the next target based on the closest distance to the bottom border of the context image, which, under the assumed geometry, implies the closest object to the camera. This policy implicitly pursues an objective of minimizing missed targets under the assumed geometry, because closer objects traverse the field of view faster than far objects. Also, depending on the exact geometry, the top of the context image may, in fact, be a very unlikely or impossible location for departing targets to leave the context image.

A “Center to Periphery” policy is also known as “First Center.” This policy selects the next target based on the closest distance to the center of a context image taken by a wide angle camera. This policy implicitly pursues minimizing traveling cost under the assumption that most targets will be concentrated in the center of the image, or will move towards the center, which often is the center of interest at a particular location.

A “Periphery to Center” policy is also known as “Last Center.” This policy selects the next target based on the closest distance to the borders of the context image. This policy implicitly pursues minimizing missed targets under the assumption that targets near the borders are most likely to depart the site.

A “Nearest Neighbor” policy selects the next target based on the closest distance to the current attention point of the PTZ camera. This policy explicitly pursues minimizing traveling.

A “Shortest Path” policy selects the next target based on an optimization that minimizes the overall time to observe all the targets in the site. This policy tries to reduce the overall traveling cost of the PTZ cameras supposing that targets do not move.
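The following sketch illustrates how three of the non-predictive policies above select the next target; the target and camera-state dictionaries and their keys (arrival_time, image_xy, attention_xy) are assumptions for illustration only.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def earliest_arrival(targets, state, image_h=480):
    # "First Come, First Served": earliest arrival time in the site wins.
    return min(targets, key=lambda t: t["arrival_time"])

def close_to_far(targets, state, image_h=480):
    # "Bottom to Top": smallest distance to the bottom border of the context image
    # (image y grows downward, so the bottom border is at y = image_h).
    return min(targets, key=lambda t: image_h - t["image_xy"][1])

def nearest_neighbor(targets, state, image_h=480):
    # Smallest distance from the PTZ camera's current attention point.
    return min(targets, key=lambda t: dist(t["image_xy"], state["attention_xy"]))

targets = [{"id": 1, "arrival_time": 3.0, "image_xy": (120, 400)},
           {"id": 2, "arrival_time": 1.5, "image_xy": (500, 100)}]
state = {"attention_xy": (130, 380)}
print(earliest_arrival(targets, state)["id"])   # 2
print(nearest_neighbor(targets, state)["id"])   # 1
```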

Predictive Policies

Whereas the non-predictive policies generally implicitly optimize surveillance goals under various assumptions, predictive policies tend to explicitly optimize these surveillance objectives. Predictive policies explicitly predict target departure times and PTZ traveling times to select the optimal target. For all of the following policies, each target's path is predicted for a number of time intervals in the future. Using these predicted paths along with the current pointing of the camera and the known speed of the camera, it is possible to predict where and when the PTZ camera can intersect a target path and where and when each target is expected to depart the site. These predictions can be used to implement the following predictive scheduling policies.

An “Estimated Nearest Neighbor” policy pursues minimizing traveling similar to the “Nearest Neighbor” policy. However, instead of determining travel time using the current static locations of targets, this policy computes traveling times to each target using predicted target paths and the speed of PTZ cameras. It selects the next target based on the shortest predicted traveling time.

An “Earliest Departure” policy pursues minimizing missed targets explicitly by using predicted departure times from the predicted target paths. It selects the next target based on the earliest predicted departure time.

A “Conditional Earliest Departure” policy is similar to the “Earliest Departure” policy except that this policy also considers the traveling time of the PTZ camera to the target, and will skip a target if it predicts the PTZ camera will miss the target.
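A rough sketch of the “Conditional Earliest Departure” selection is shown below, assuming a constant-velocity path predictor and a simple pan-time model for the PTZ camera; the dictionary keys and both models are illustrative assumptions, not the described embodiment.

```python
import math

def predict_path(target, horizon, dt=1.0):
    # Constant-velocity extrapolation of the target path (assumed predictor).
    x, y = target["xy"]
    vx, vy = target["velocity"]
    return [(x + vx * k * dt, y + vy * k * dt) for k in range(1, horizon + 1)]

def travel_time(camera, point):
    # Time for the PTZ camera to pan toward 'point', from its angular speed
    # (a simplified model that ignores tilt, zoom, and angle wrap-around).
    dx, dy = point[0] - camera["xy"][0], point[1] - camera["xy"][1]
    angle = math.degrees(math.atan2(dy, dx))
    return abs(angle - camera["pan_deg"]) / camera["pan_speed_deg_s"]

def conditional_earliest_departure(targets, camera, now, horizon=30):
    feasible = []
    for t in targets:
        aim = predict_path(t, horizon)[0]          # simple proxy for the intercept point
        depart = t["predicted_departure"]          # predicted time the target leaves the site
        # Skip the target if the camera cannot reach it before its predicted departure.
        if now + travel_time(camera, aim) < depart:
            feasible.append((depart, t))
    # Among reachable targets, choose the earliest predicted departure.
    return min(feasible, key=lambda pair: pair[0])[1] if feasible else None
```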

Traffic Model Set

Each traffic model represents a set of objects in a site. The objects are associated with types, e.g., people, cars or equipment. The objects can be static or moving. In the latter case, the objects can be associated with trajectories. The trajectories indicate the paths of the objects, the speed of the objects, and their times of arrival and departure at particular locations. The traffic models can be generated by hand, automatically, or from historical data, e.g., surveillance video of a site.

Simulator

The simulator 30 generates the surveillance signals using instances of selected surveillance models. As stated above, each instance includes a site, sensor and traffic model. The simulator can apply computer graphics and animation tools to the selected models to generate the signals. The surveillance signals can be in the form of sequences of images (video) or other data signals consistent with the site, sensor and traffic models. After the models have been selected, the simulator operates completely automatically.

Evaluator

The evaluator 24 analyzes the surveillance signals to determine values of a performance metric for the surveillance system, as described below.

Method Operation

The system simulates an operation of the surveillance system 20 by selecting specific instances of the models 22. To do this, the simulator generates the output video for the sensors that are modeled as cameras, and perhaps, detected events for other sensors, e.g., motion activity in a local area.

To perform the generation, the simulator can use conventional computer graphic and animation tools. For a particular camera, the simulator renders a scene as a video, using the site, sensor, and traffic models.

Our rendering techniques are similar to conventional techniques used in video games and virtual reality applications, which allow a user to interact with a computer-simulated environment. Similar levels of photorealism can be attained with our simulator. In a simplistic implementation, people can be rendered as avatars; a more sophisticated implementation can render identifiable “real” people and recognizable objects using, perhaps, prestored video clips.

FIG. 3 is an overhead image of a site with a fixed camera 301 with a wide FOV, a PTZ camera 302, and targets 303. FIG. 4 shows an image for the fixed camera for the site shown in FIG. 3. In one embodiment, the avatars are rendered as green bodies with yellow heads against a grayish background to facilitate the detecting and tracking procedures.

Performance Goals

One of the goals of our system is to enable a user to better understand relevant events and objects in an environment. For example, a surveillance system should enable a user to learn the locations, activities, and identity of people in an environment.

In qualitative terms, if a surveillance system can meet its goals completely, then the system is fully successful. It would be useful to have a quantitative metric of how well the system meets predetermined qualitative performance goals. In other words, it is useful to translate qualitative notions of successful performance into a quantitative metric of successful performance. This is what our system does.

As shown in FIG. 2, we evaluate the performance goal (and functions) of our surveillance system using the following subgoals:

a. Knowing where each person is (object detection and tracking) 121;
b. Knowing what each person is doing (action recognition) 122; and
c. Knowing who each person is (object identification) 123.

The overall system performance 101 can be considered to be a weighted sum of individual performance metrics for the above subgoals:

$\Pi = \sum_{g \in G} \alpha_g \, \Pi_g, \qquad (1)$

where

- Π˜Performance; Π ε [0,1]
- G˜Set of all goals
- Π_(g)˜Performance for goal ‘g’; Π_(g) ε [0,1]
- α_(g)˜Weight for goal ‘g’; α_(g) ≥ 0, and Σ_(gεG) α_(g) = 1.

The weights can be equal. In this case, the overall performance is an average of the performances. For the three surveillance goals listed above, the goal set is

G≡{track, action, id},

and we define the quantitative performance metrics as

Π_(track), Π_(action), and Π_(id).
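As a simple illustration of the weighted combination in Equation (1), the following sketch computes the overall performance from per-goal metric values, defaulting to equal weights; the dictionary-based interface and the example values are assumptions for illustration only.

```python
def overall_performance(goal_scores, weights=None):
    # Equation (1): Pi = sum over goals g of alpha_g * Pi_g, with alpha_g >= 0
    # and the weights summing to one.  goal_scores maps each goal to Pi_g in [0, 1].
    goals = list(goal_scores)
    if weights is None:                               # equal weights -> plain average
        weights = {g: 1.0 / len(goals) for g in goals}
    assert all(w >= 0.0 for w in weights.values())
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[g] * goal_scores[g] for g in goals)

# Example with assumed metric values for the three goals:
print(overall_performance({"track": 0.91, "action": 0.63, "id": 0.47}))   # ~0.67
```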

Notation used below includes:

- T˜Set of all discrete time instances in a scenario
- t˜A discrete time instance (t ε T)
- X˜Set of all targets in a scenario
- x˜A target (x ε X)
- C˜Set of all cameras in the video surveillance system
- c˜A camera (c ε C)

Generally, not all targets are present in the site all of the time. The surveillance system is only responsible for targets in the site. Therefore, we define a target presence function

$\sigma(x,t) = \begin{cases} 1 & \text{if target } x \text{ is present at time } t \\ 0 & \text{otherwise,} \end{cases} \qquad (2)$

and opportunities

O˜Set of all opportunities (x,t) to view a target,

{(x,t) | x ε X, t ε T, σ(x,t) = 1},   (3)

which are a subset of all target-time pairs

O ⊂ X × T.

Relevant Pixels

In one embodiment of the invention, the quantitative metric is “relevant pixels.” We define the relevant pixels as the subset of pixels that contribute to an understanding of objects and events in acquired surveillance signals. For example, to identify a person using face recognition, relevant pixels are the pixels of the face of the person. This requires that the face be in a field of view of the camera, and that a plane of the face is substantially coplanar with the image plane of the camera. Thus, an image of a head facing away from the camera does not have any relevant pixels. To locate a person, perhaps all pixels of the body are relevant, while pixels in the background portion are not. The definition of relevant pixels may vary from goal to goal, as described below. In general, relevant pixels are associated with a target in an image taken by one of the cameras.

For each subgoal, we specify a likelihood function that expresses the probability that the subgoal can be met for a particular target at a particular instance in time, i.e., in a single image, as a function of relevant pixels. In general, if no relevant pixels are acquired, the likelihood is zero. The likelihood increases with the number of relevant pixels and eventually approaches unity.

There may be a non-zero minimum number of pixels before a goal has any realistic chance of being achieved. Also, there is a point of diminishing returns at which increasing the number of relevant pixels does not improve the probability of success. Thus, the likelihood versus relevant pixels is flat at zero up to some minimum number of pixels n_(min), then increases to unity at some maximum number of pixels n_(max), and remains flat at unity thereafter. Such a linear likelihood function can have the form

$\begin{matrix}{{L(n)} = {{P\left( g \middle| n \right)} = \left\{ \begin{matrix}0 & {0 \leq n \leq n_{\min}} \\{\left( {n - n_{\min}} \right)/\left( {n_{\max} - n_{\min}} \right)} & {n_{\min} \leq n \leq n_{\max}} \\1 & {{n_{\max} \leq n},}\end{matrix} \right.}} & (4)\end{matrix}$

where

- g˜Goal
- n˜Number of relevant pixels; n≧0
- P(g|n)˜Likelihood of ‘n’; i.e., probability of achieving ‘g’ given ‘n’

If n_(min)=n_(max), then the likelihood function is a step function.
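A direct transcription of the likelihood function of Equation (4), treating the case n_(min)=n_(max) as a step function, might look as follows; the function name and argument layout are illustrative.

```python
def likelihood(n, n_min, n_max):
    # L(n) = P(g | n) from Equation (4): 0 below n_min, a linear ramp between
    # n_min and n_max, and 1 at or above n_max.
    if n_min == n_max:                    # degenerate case: step function
        return 1.0 if n >= n_max else 0.0
    if n <= n_min:
        return 0.0
    if n >= n_max:
        return 1.0
    return (n - n_min) / (n_max - n_min)

# Example: with n_min = 100 and n_max = 400, 250 relevant pixels give L = 0.5.
print(likelihood(250, 100, 400))   # 0.5
```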

Quantitative Performance Metric and Qualitative Goals

We now describe our quantitative performance metrics in greater detail. Typically, a large number of simulations are performed, which can be evaluated statistically. Prior art surveillance systems do not have this capability of automatically evaluating a large number of different surveillance systems.

Evaluation

As stated above, the evaluation of the performance of a surveillance system uses synthetic or real surveillance signals.

Evaluating Object Detection and Tracking

A 3-D location of a target is initially detected when its 2-D location is determined in one image. Tracking performance for one target, at one time in one camera is quantified in terms of the number of pixels required to track a target. These are the relevant pixels. Using the above defined notation:

L_(track)(n(x,t,c))   (5)

as in Equation 4 with

- n_(min)=Minimum number of pixels required for tracking
- n_(max)=Maximum number of pixels required for tracking,

where

- x˜Target
- t˜Time
- c˜Camera
- n(x,t,c)˜Number of pixels of target ‘x’ in camera ‘c’ at time ‘t’

The likelihood function is evaluated for each camera for each opportunity. The performance metric is the normalized sum over all opportunities of the maximum over all cameras of the tracking likelihood function. In our notation,

$\Pi_{track} = \frac{1}{|O|} \sum_{(x,t) \in O} \max_{c \in C} L_{track}\big(n(x,t,c)\big). \qquad (6)$

In words, for each opportunity the system has to observe a target, i.e., each discrete time at which the target is present in the site, the number of pixels of that target in each camera is used to determine the likelihood of tracking the target from that camera. The overall likelihood of tracking the target is taken as the maximum likelihood over all cameras. This maximum likelihood is summed over all “opportunities,” and this sum is normalized by the total number of opportunities to obtain the performance metric. Note that

Π_(track) ε [0,1].
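A minimal sketch of the computation in Equation (6) is shown below, assuming the opportunity set and the per-camera pixel counts are supplied by the simulator; the callable interfaces (pixels, l_track) are illustrative assumptions.

```python
def performance_track(opportunities, cameras, pixels, l_track):
    # Equation (6): for each opportunity (x, t), take the best camera's tracking
    # likelihood, then normalize the sum by the number of opportunities.
    # pixels(x, t, c) returns n(x, t, c); l_track is the likelihood of Equation (4).
    total = 0.0
    for (x, t) in opportunities:
        total += max(l_track(pixels(x, t, c)) for c in cameras)
    return total / len(opportunities)       # Pi_track lies in [0, 1]
```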

Evaluating Action Recognition

For action recognition, a higher resolution is required than for tracking, and each target must be viewed from multiple angles so that the entire surface of the target is acquired. We define a surface-coverage function

${s\left( {x,t,c,\theta} \right)} = \left\{ \begin{matrix}1 & {{{Target}\mspace{11mu} {‘x’}},} \\\; & {{surface}\mspace{14mu} {at}\mspace{14mu} {Angle}\mspace{11mu} {‘\theta ’}\mspace{14mu} {visible}\mspace{14mu} {in}\mspace{14mu} {Camera}\mspace{11mu} {‘c’}\mspace{11mu} {at}\mspace{14mu} {Time}\mspace{11mu} {‘t’}} \\0 & {{otherwise}.}\end{matrix} \right.$

If the target is a person, then the target can be modeled as a vertical cylinder for the purpose of object detection. In one embodiment, in which cameras are mounted on walls or ceilings with a generally horizontal view of the people, each vertical line on the cylindrical surface is typically either completely visible in a camera or completely invisible. Thus, each such line is identified by its angle θ in the horizontal plane, and then, for each surface location and each camera, it is determined whether the surface is viewable by that camera.

In order to determine this, the surface-coverage function is used, which computes its answer by drawing a line from the surface point to each camera's center of projection and determining whether that line falls within the field of view of that camera. When simulating surveillance, there are many ways of determining exactly how much of each target's surface is covered by the cameras. However, for the purposes of developing a simple formulation for performance, a cylindrical model is used, although others could also be applied.
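Under the cylindrical target model, the surface-coverage test described above might be sketched as follows; the camera fields (xy, heading_deg, fov_deg) and the target radius are illustrative assumptions, and occlusion by other objects is ignored in this sketch.

```python
import math

def surface_coverage(target_xy, radius, theta, camera):
    # s(x, t, c, theta): 1 if the vertical surface line of the cylindrical target
    # at horizontal angle theta is visible in the camera, 0 otherwise.
    sx = target_xy[0] + radius * math.cos(theta)    # surface point in the ground plane
    sy = target_xy[1] + radius * math.sin(theta)
    cx, cy = camera["xy"]
    # Bearing of the line from the camera's center of projection to the surface point.
    bearing = math.degrees(math.atan2(sy - cy, sx - cx))
    in_fov = abs((bearing - camera["heading_deg"] + 180.0) % 360.0 - 180.0) <= camera["fov_deg"] / 2.0
    # The surface normal (cos theta, sin theta) must also face the camera.
    facing = math.cos(theta) * (cx - sx) + math.sin(theta) * (cy - sy) > 0.0
    return 1 if (in_fov and facing) else 0
```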

The performance metric for action recognition can then be expressed as

$\Pi_{action} = \frac{1}{|O|} \sum_{(x,t) \in O} \frac{1}{2\pi} \int_{0}^{2\pi} \max_{c \in C}\Big( L_{action}\big(n(x,t,c)\big)\, s(x,t,c,\theta) \Big)\, d\theta, \qquad (7)$

where L_(action) is similar to L_(track), but with higher n_(min) and n_(max).
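Equation (7) can be approximated by discretizing the integral over θ, as in the sketch below; the callables pixels, coverage (the function s), and l_action, and the number of angular samples, are illustrative assumptions.

```python
import math

def performance_action(opportunities, cameras, pixels, coverage, l_action, n_theta=36):
    # Equation (7): average, over opportunities and surface angles, of the best
    # camera's product of action likelihood and surface coverage.
    total = 0.0
    for (x, t) in opportunities:
        acc = 0.0
        for k in range(n_theta):
            theta = 2.0 * math.pi * k / n_theta
            acc += max(l_action(pixels(x, t, c)) * coverage(x, t, c, theta) for c in cameras)
        total += acc / n_theta              # approximates (1 / 2*pi) * integral over theta
    return total / len(opportunities)
```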

Evaluating Object Identification

In one embodiment of the invention, people are identified by a face recognition subsystem. Typically, minimum requirements for face recognition include a relatively high resolution set of pixels of the face with the face oriented within a limited range of pose with respect to the camera.

For the resolution, we can use a relevant pixel likelihood function, L_(id), following Equation 4, in which n_(min) and n_(max) are higher than those for L_(action), and higher again than those for L_(track). The relevant pixels are only of the face of the target person, not the rest of the body as for tracking and action recognition. Thus, the required resolution is actually much higher than that required for tracking or action.

A pose function is defined as

${\Phi \left( {x,t,c} \right)} = \left\{ \begin{matrix}{1 - {\varphi/\varphi_{\max}}} & {{\varphi } \leq \varphi_{\max}} \\0 & {{{\varphi } > \varphi_{\max}},}\end{matrix} \right.$

where

- φ˜Pose angle from ideal pose
- φ_(max)˜Maximum φ allowing face recognition

A performance metric for identification by face recognition is expressed as

$\Pi_{id} = \frac{1}{|X|} \sum_{x \in X} \max_{\{t \in T \mid \sigma(x,t) = 1\}} \; \max_{c \in C} \Big( L_{id}\big(n(x,t,c)\big)\, \Phi(x,t,c) \Big). \qquad (8)$

In words, the total metric is the sum of a metric for each target, normalized by the number of targets. Each target, in principle, only requires one good image to be identified, so we use the best one, defined by the highest product of the resolution measure (L_(id)) and the pose measure (Φ) over all cameras over all discrete times at which the target is present in the site.
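A sketch of Equation (8), including the pose function Φ, is given below; the callables present (σ), face_pixels, and pose_angle, and the argument layout, are assumptions for illustration.

```python
def pose_factor(phi, phi_max):
    # Phi(x, t, c): degrades linearly as the pose angle departs from the ideal pose.
    return 1.0 - phi / phi_max if phi <= phi_max else 0.0

def performance_id(targets, times, cameras, present, face_pixels, pose_angle, l_id, phi_max):
    # Equation (8): each target needs only its single best view, i.e., the highest
    # product of the face-resolution likelihood and the pose factor over all cameras
    # and all times at which the target is present in the site.
    total = 0.0
    for x in targets:
        best = 0.0
        for t in times:
            if not present(x, t):           # sigma(x, t) = 1 only while the target is in the site
                continue
            for c in cameras:
                score = l_id(face_pixels(x, t, c)) * pose_factor(pose_angle(x, t, c), phi_max)
                best = max(best, score)
        total += best
    return total / len(targets)
```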

Lighting, occlusion, and facial expression also contribute to the success of face recognition. Therefore, in practice, having multiple views of each person is beneficial.

The performance metric is adjusted to reflect these realities in different embodiments, but in this particular embodiment we use the slightly idealized metric requiring just one good picture per person.

Overall Performance

The performance of the surveillance system can be evaluated individually for the component performance goals or in aggregate for overall performance. The overall relevant pixel performance metric, with equal weighting, is an average of the three performance metrics

$\Pi = \frac{1}{3}\left( \Pi_{track} + \Pi_{action} + \Pi_{id} \right).$

Other weightings can be applied in different embodiments, depending on the surveillance scenario and performance goals. For example, for tests involving evaluation and comparison of scheduling policies, we limit our simulations to those in which all targets are always trackable in all cameras. Therefore, we evaluate Π_(action) and Π_(id) individually, with respect to various PTZ schedules.

Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

1. A computer implemented method for measuring a performance of a surveillance system, comprising the steps of: selecting a site model, a sensor model and a traffic model respectively from a set of site models, a set of sensor models, and a set of traffic models to form a surveillance model; generating surveillance signals using the surveillance model, in which the surveillance signals include a sequence of images; determining a quantitative performance metric for each surveillance goal in a set of qualitative surveillance goals, in which the quantitative performance metric is a number of relevant pixels in the sequence of images, and in which the relevant pixels are associated with a target object in the sequence of images, and in which the qualitative performance goals include an object detection and tracking subgoal, an action recognition subgoal, and an object identification subgoal, and a likelihood function expresses a probability that the subgoal can be met for the target object at a particular instance in time as a function of the number of relevant pixels, in which the likelihood function has a form

$L(n) = P(g \mid n) = \begin{cases} 0 & 0 \leq n \leq n_{\min} \\ (n - n_{\min})/(n_{\max} - n_{\min}) & n_{\min} \leq n \leq n_{\max} \\ 1 & n_{\max} \leq n, \end{cases}$

where n is the number of pixels, g is a subgoal, n_(min) is a minimum number of relevant pixels, and n_(max) is a maximum number of pixels; measuring a value for each of the quantitative performance metrics using the surveillance signals; and evaluating a performance of the surveillance system according to the values of the quantitative performance metrics measured from the surveillance signals.

2. The method of claim 1, further comprising: forming a plurality of the surveillance models; performing automatically the generating and the measuring steps for each surveillance model in the plurality of the surveillance models to determine a plurality of the values; and analyzing statistically the plurality of the values.
3. The method of claim 2, in which a particular instance of the site model is selected for evaluation with a plurality of instances of the sensor models and a plurality of instances of the traffic models.
4. The method of claim 1, in which each site model is a spatial description of where the surveillance system is to operate.
5. The method of claim 1, in which each sensor model specifies a set of sensors, and in which the set of sensors includes a fixed camera and an active camera.
6. The method of claim 5, in which each sensor is associated with a set of scheduling policies.
7. The method of claim 6, in which the set of scheduling policies includes predictive and non-predictive scheduling policies.

8. The method of claim 1, in which each traffic model includes a set of objects, each object having a type and a trajectory.
9. The method of claim 1, in which the generating applies computer graphics and animation techniques to the surveillance model to generate the surveillance signals used for measuring the quantitative performance metrics.
10. The method of claim 1, in which the surveillance signals include signals acquired from a real world surveillance system.
11. The method of claim 2, in which the selecting is automated.
12. The method of claim 1, in which the qualitative performance goals include an object detection and tracking subgoal, an action recognition subgoal, and an object identification subgoal.
13. The method of claim 12, in which each qualitative subgoal is associated with a corresponding quantitative performance metric for the qualitative subgoal.
14. The method of claim 13, in which the evaluating step weights the values of the quantitative performance metrics for the subgoals.
15. The method of claim 13, in which the performance of the surveillance system is a weighted average of values of the corresponding quantitative performance metrics for the qualitative subgoals.
 16. (canceled)
 17. (canceled)
18. (canceled)

19. (canceled)
 20. (canceled)
21. A computer implemented method for measuring a performance of a surveillance system, comprising the steps of: obtaining surveillance signals of a surveillance system, wherein the surveillance signals include a sequence of images; determining a quantitative performance metric for each surveillance goal in a set of qualitative surveillance goals, wherein the quantitative performance metrics are based on a number of relevant pixels in the sequence of images; measuring a value for each of the quantitative performance metrics using the surveillance signals, wherein a likelihood function expresses a probability that the surveillance goal in the set of qualitative surveillance goals can be met and has a form

$L(n) = P(g \mid n) = \begin{cases} 0 & 0 \leq n \leq n_{\min} \\ (n - n_{\min})/(n_{\max} - n_{\min}) & n_{\min} \leq n \leq n_{\max} \\ 1 & n_{\max} \leq n, \end{cases}$

where n is the number of pixels, g is a surveillance goal, n_(min) is a minimum number of relevant pixels, and n_(max) is a maximum number of pixels; and evaluating a performance of the surveillance system according to the values of the quantitative performance metrics.
22. The method of claim 21, wherein the set of qualitative surveillance goals includes an object detection and tracking goal, an action recognition goal, and an object identification goal.